-
Notifications
You must be signed in to change notification settings - Fork 390
native_lib/trace: fix and reenable #4435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
37fbac4
to
d2ed49d
Compare
The only options I see for making the builds work nicely regardless of arch are either (a) copy-pasting a massive |
0110f5a
to
6907e9d
Compare
It should be entirely possible to do this with a single |
src/shims/native_lib/mod.rs
Outdated
//#[cfg(target_os = "linux")] | ||
//pub mod trace; | ||
#[cfg(trace)] | ||
pub mod trace; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the amount of cfg
in this file, I would suggest something like this:
#[cfg_attr(not(all(target_os = "linux", any(target_arch = "x86_64", ...), path = "trace/stub.rs")]
mod trace;
and then the stub.rs
file contains just a bunch of empty functions and dummy type aliases so that to the outside, it looks like all the same functions and types exist.
src/alloc/mod.rs
Outdated
@@ -1,5 +1,5 @@ | |||
mod alloc_bytes; | |||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just keep using that allocator on all Linux targets, no harm in that.
@rustbot author |
Reminder, once the PR becomes ready for a review, use |
3b1584e
to
93015d6
Compare
9e54dab
to
f2eb0e3
Compare
60837c7
to
f411f4f
Compare
2f028a6
to
9e8e277
Compare
Seems like CI is fine. @rustbot ready |
src/shims/native_lib/trace/child.rs
Outdated
// This might be desired behaviour, as even on platforms where ptracing | ||
// is not implemented it enables us to enforce that only one FFI call | ||
// happens at a time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not very reassuring if the code says "might be desired behavior".^^
// This might be desired behaviour, as even on platforms where ptracing | |
// is not implemented it enables us to enforce that only one FFI call | |
// happens at a time. | |
// As a side-effect, even on platforms where ptracing | |
// is not implemented, we enforce that only one FFI call | |
// happens at a time. |
It's kind of odd though that we would do this for "Linux without ptrace", but not do it on macos...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside of many-seeds performance-sensitive scenarios, I find it hard to imagine when we wouldn't be okay with this blocking. I do recall running into a concrete case where this was desired; the ptr_write_access
test will write to statics, so performing it in in many-seeds mode without the tracing caused it to randomly segfault (but it was fine with, as it forced the threads to take turns)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm fine with the blocking.
But then the do_ffi
implementation in trace/stub.rs
should also do it to ensure feature parity. :)
src/shims/native_lib/trace/parent.rs
Outdated
const WAIT_FLAGS: wait::WaitPidFlag = | ||
wait::WaitPidFlag::from_bits_truncate(libc::WUNTRACED | libc::WEXITED); | ||
wait::WaitPidFlag::WUNTRACED.union(wait::WaitPidFlag::WEXITED); | ||
|
||
/// Arch-specific maximum size a single access might perform. x86 value is set | ||
/// assuming nothing bigger than AVX-512 is available. | ||
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))] | ||
const ARCH_MAX_ACCESS_SIZE: usize = 64; | ||
/// The largest arm64 simd instruction operates on 16 bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any citation or so you can give for this?
Seems pretty important that we don't get this wrong.^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a comment explaining the rationale, but there isn't really a canonical set of flags to use for this that I know of. WEXITED
makes sure we actually kill the child process when it exits instead of leaving it as a zombie, and WUNTRACED is mostly helpful at the very start before we've ptraced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, I think you misunderstood my comment.^^ It refers to this:
The largest arm64 simd instruction operates on 16 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh! oops. Yes, I'll add a citation
ef187be
to
199bf13
Compare
a06efe1
to
f17c0ce
Compare
@rustbot ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rebased and pushed a commit with some cleanups, please check if those make sense.
src/shims/native_lib/trace/parent.rs
Outdated
@@ -417,71 +412,13 @@ fn handle_segfault( | |||
if acc_ty.is_writable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it actually happen that the same operation is both a read and a write? Is that used for instance for atomic fetch-and-add and operations like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know which ops exactly (I know on arm64 there's an explicit swap register with memory op for example), but yes there are x86 ops that do both. The write isn't always done on all ops; per iced
which I was using before and is explicitly x86-focused so I'll trust as a source, the possible options are Read, CondRead, Write, CondWrite, ReadWrite, and ReadCondWrite. Since this only triggers when an access occurs, we can treat CondRead and CondWrite the same as their normal versions bc we know the condition was met, and putting a write after the read on a ReadCondWrite if the condition wasn't met is still fine in the sense that we don't lose information, just might accidentally mark uninit stuff as init.
We could maybe get around this with another mempr_*
function so that instead we first set perms to read instead of RW, and if it retriggers a segfault we know for sure the write happens, but that might add a good bit of code complexity. It's an idea I've had for a bit though and plan on implementing at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh so these events are still approximate, we don't know if both a read and a write actually happened? Good to know, we have to keep that in mind when interpreting the events. That reduces the benefit of having precise range information...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that's only on the RCW events. I can add a boolean is_certain
field on the write and if it's false we double-check that the allocation it's writing to is mutable first, but that will be a very rare edge case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is UB to run compare_exchange
on an immutable allocation even if it ends up not writing anything. So that particular check should be fine, if it only affects such atomic operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll pour over the iced source later then, just to see which instructions that flag is set for. Ty!
They all make sense, yeah. Figured I could leave the correct bits of the arm64 code as long as they're disabled, but I'll just add them all back at once in a later PR ^^ |
I'd rather not have code in the repo that doesn't even get built. |
Co-Authored-By: Ralf Jung <post@ralfj.de>
Per discussion in various PRs (#4401, #4405), this addresses (some of) the points raised. Also using this to check if CI passes on all architectures.